Abstract

Factorization Machines (FM) are currently only used in a narrow range of applications and
are not yet part of the standard machine learning toolbox, despite their great success in
collaborative filtering and click-through rate prediction. However, Factorization Machines
are a general model for sparse and high dimensional features. Our Factorization
Machine implementation (fastFM) provides easy access to many solvers and supports
regression, classification and ranking tasks. Such an implementation simplifies the use of
FM for a wide range of applications. Therefore, our implementation has the potential to
improve understanding of the FM model and drive new development.
1. Introduction

This work aims to facilitate research for matrix factorization based machine learning (ML) 
models. Factorization Machines are able to express many different latent factor models and 
are widely used for collaborative filtering tasks (Rendle, 2012b). An important advantage 
of FM is that the model equation 


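$$w_0 \in \mathbb{R}, \quad x, w \in \mathbb{R}^{p}, \quad v_i \in \mathbb{R}^{k}$$

$$\hat{y}^{FM}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j>i}^{p} \langle v_i, v_j \rangle x_i x_j \qquad (1)$$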

conforms to the standard notation for vector based ML. FM learn a factorized coefficient
$\langle v_i, v_j \rangle$ for each feature pair $x_i x_j$ (eq. 1). This makes it possible to model very sparse feature
interactions. For example, encoding a sample as $x = \{\cdots, 0, 1, 0, \cdots, 0, 1, 0, \cdots\}$, with the ones at positions $i$ and $j$,
yields $\hat{y}^{FM}(x) = w_0 + w_i + w_j + v_i^T v_j$, which is equivalent to (biased) matrix factorization
$R_{i,j} \approx b_0 + b_i + b_j + u_i^T v_j$ (Srebro et al., 2004). Please refer to Rendle (2012b) for more
encoding examples. FM have been the top performing model in various machine learning
competitions (Rendle and Schmidt-Thieme, 2009; Rendle, 2012a; Bayer and Rendle, 2013)
with different objectives (e.g. What Do You Know? Challenge¹, EMI Music Hackathon²).
fastFM includes solvers for regression, classification and ranking problems (see Table 1) and
addresses the following needs of the research community: (i) easy interfacing for dynamic
and interactive languages such as R, Python and Matlab; (ii) a Python interface allowing
interactive work; (iii) a publicly available test suite that greatly simplifies modifying the code or
adding new features; (iv) code released under the BSD license, allowing integration
into (almost) any open source project.

1. http://www.kaggle.com/c/WhatDoYouKnow

2. http://www.kaggle.com/c/MusicHackathon
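The model equation (eq. 1) and its reduction to biased matrix factorization can be illustrated with a minimal pure-Python sketch. It is not taken from fastFM; the function names and toy parameter values are assumptions made for illustration. The second function uses the well-known reformulation of the pairwise term, which is not stated in the text above, to evaluate the same quantity in time linear in the number of features and the rank.

```python
def fm_predict(x, w0, w, V):
    """Evaluate eq. (1) directly: bias + linear term + pairwise interactions."""
    p = len(x)
    linear = sum(w[i] * x[i] for i in range(p))
    pairwise = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            dot = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same value in O(p * k): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    p, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(p))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(p))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(p))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Toy parameters (made up for illustration): p = 4 features, k = 2 factors.
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.4]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]]

# Sample with ones at positions i = 1 and j = 3 only.
x = [0.0, 1.0, 0.0, 1.0]
y_hat = fm_predict(x, w0, w, V)  # equals w0 + w[1] + w[3] + <V[1], V[3]>
```

With this encoding only one interaction term survives, so the prediction has the same form as the biased matrix factorization model, with the feature weights playing the role of the user and item biases.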

2. Design Overview 

The fastFM library has a multi-layered software architecture (Figure 1) that separates the
interface code from the performance critical parts (fastFM-core). The core contains the
solvers, is written in C and can be used stand-alone. Two user interfaces are available: a
command line interface (CLI) and a Python interface. Cython (Behnel et al., 2011) is used
to create a Python extension from the C library. Both the Python and the C interface serve
as reference implementations for bindings to additional languages.

2.1 fastFM-core 


Figure 1: Library architecture. The Python interface, fastFM (Py), is built with Cython on top of fastFM-core (C); the CLI also wraps fastFM-core (C).


FM are usually applied to very sparse design matrices, often with
a sparsity over 95 %, due to their ability to model interactions
between very high dimensional categorical features. We use the
standard compressed row storage (CRS) matrix format as the underlying
data structure and rely on the CXSparse³ library (Davis,
2006) for fast sparse matrix / vector operations. This simplifies
the code and makes memory sharing between Python and C
straightforward.
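To make the storage format concrete, the following sketch builds a compressed row storage matrix by hand for one-hot encoded (user, item) pairs and multiplies it with a weight vector. It is a pure-Python illustration, not the CXSparse routines used by fastFM-core; the helper names `one_hot_csr` and `csr_matvec` and the toy data are hypothetical.

```python
def one_hot_csr(pairs, n_users, n_items):
    """Encode (user, item) index pairs as a CRS design matrix.

    Returns (data, indices, indptr, shape); every row stores exactly two ones.
    """
    data, indices, indptr = [], [], [0]
    for user, item in pairs:
        indices.extend([user, n_users + item])  # column ids of the two non-zeros
        data.extend([1.0, 1.0])
        indptr.append(len(indices))             # end offset of this row
    return data, indices, indptr, (len(pairs), n_users + n_items)


def csr_matvec(data, indices, indptr, vec):
    """Sparse matrix-vector product that touches only the stored non-zeros."""
    out = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        out.append(sum(data[k] * vec[indices[k]] for k in range(start, end)))
    return out


pairs = [(0, 2), (1, 0), (2, 1)]  # hypothetical (user, item) observations
data, indices, indptr, shape = one_hot_csr(pairs, n_users=3, n_items=3)
sparsity = 1.0 - len(data) / (shape[0] * shape[1])
linear_term = csr_matvec(data, indices, indptr, [0.1, 0.2, 0.3, 1.0, 2.0, 3.0])
```

Because each row holds two non-zeros regardless of the number of users and items, the sparsity of such a matrix approaches 100 % as the number of categorical levels grows, and the three flat arrays can be shared between languages without copying nested structures.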

fastFM contains a test suite that is run on each commit to the GitHub repository via a
continuous integration server⁴. Solvers are tested using state of the art techniques, such as
Posterior Quantiles (Cook et al., 2006) for the MCMC sampler and Finite Differences for
the SGD based solvers.
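The idea behind a finite-difference test can be sketched as follows: an analytic gradient, as an SGD solver would use it, is compared against central differences of the loss. This is an illustrative pure-Python sketch for the squared loss on a single sample, not the test code of fastFM-core; all function names and values are assumptions.

```python
def fm_predict(x, w0, w, V):
    """FM prediction (eq. 1) using the O(p * k) form of the pairwise term."""
    p, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(p))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(p))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(p))
        y += 0.5 * (s * s - s_sq)
    return y


def loss(x, y, w0, w, V):
    """Squared loss on one sample."""
    return 0.5 * (fm_predict(x, w0, w, V) - y) ** 2


def grad_V(x, y, w0, w, V):
    """Analytic gradient of the loss with respect to every V[i][f]."""
    err = fm_predict(x, w0, w, V) - y
    p, k = len(x), len(V[0])
    grad = [[0.0] * k for _ in range(p)]
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(p))
        for i in range(p):
            grad[i][f] = err * x[i] * (s - V[i][f] * x[i])
    return grad


def numeric_grad_V(x, y, w0, w, V, eps=1e-6):
    """Central finite differences, perturbing one entry of V at a time."""
    p, k = len(x), len(V[0])
    grad = [[0.0] * k for _ in range(p)]
    for i in range(p):
        for f in range(k):
            original = V[i][f]
            V[i][f] = original + eps
            up = loss(x, y, w0, w, V)
            V[i][f] = original - eps
            down = loss(x, y, w0, w, V)
            V[i][f] = original  # restore the parameter
            grad[i][f] = (up - down) / (2.0 * eps)
    return grad


# Toy sample and parameters (made up for illustration).
x, y = [1.0, 0.0, 2.0], 1.0
w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[0.5, -0.3], [0.1, 0.2], [-0.4, 0.6]]

analytic = grad_V(x, y, w0, w, V)
numeric = numeric_grad_V(x, y, w0, w, V)
max_err = max(abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn))
```

A small `max_err` indicates that the analytic gradient is consistent with the loss; a sign error or a missing factor in the gradient code would show up as a large discrepancy.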


2.2 Solver and Loss Functions 

fastFM provides a range of solvers for all supported tasks (Table 1). The MCMC solver
implements the Bayesian Factorization Machine model (Freudenthaler et al., 2011) via Gibbs
sampling. We use the pairwise Bayesian Personalized Ranking (BPR) loss (Rendle et al.,
2009) for ranking. More details on the classification and regression solvers can be found in
Rendle (2012b).
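For readers unfamiliar with the pairwise ranking loss, the sketch below evaluates the BPR criterion for one pair of model scores, where the first sample should be ranked above the second. It shows only the loss term, without regularization, and is not the fastFM implementation; the function names are hypothetical.

```python
import math


def bpr_loss(score_pos, score_neg):
    """Pairwise BPR loss: -ln(sigmoid(score_pos - score_neg))."""
    d = score_pos - score_neg
    # Equivalent to ln(1 + exp(-d)), written to avoid overflow for large |d|.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def bpr_grad_wrt_diff(score_pos, score_neg):
    """Derivative of the loss with respect to d = score_pos - score_neg."""
    d = score_pos - score_neg
    if d >= 0:
        e = math.exp(-d)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(d))


# Hypothetical FM scores for a preferred and a non-preferred sample.
well_ranked = bpr_loss(2.0, 0.0)
badly_ranked = bpr_loss(0.0, 2.0)
```

The loss is small when the preferred sample already scores higher and grows roughly linearly with the score gap when the order is wrong. An SGD step would multiply `bpr_grad_wrt_diff` by the difference of the FM gradients of the two samples.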


Task            Solver           Loss
Regression      ALS, MCMC, SGD   Square Loss
Classification  ALS, MCMC, SGD   Probit (MAP), Probit, Sigmoid
Ranking         SGD              BPR (Rendle et al., 2009)

Table 1: Supported solvers and tasks


3. CXSparse is LGPL licensed. 

4. https://travis-ci.org/ibayer/fastFM-core 
2.3 Python Interface

The Python interface is compatible with the API of the widely used scikit-learn library
(Pedregosa et al., 2011), which opens the library to a large user base. The following code
snippet shows how to use MCMC sampling for an FM classifier and how to make predictions
on new data.

from fastFM import mcmc
fm = mcmc.FMClassification(init_std=0.01, rank=8)
y_pred = fm.fit_predict(X_train, y_train, X_test)

fastFM provides additional features such as warm starting a solver from a previous solution
(see the MCMC example in Section 3).

from fastFM import als
fm = als.FMRegression(init_std=0.01, rank=8, l2_reg=2)
fm.fit(X_train, y_train)


3. Experiments 

libFM⁵ is the reference implementation for FM and the only one that provides ALS and
MCMC solvers. Our experiments show that the ALS and MCMC solvers in fastFM compare
favorably to libFM with respect to runtime (Figure 2) and are indistinguishable in terms of
accuracy. The experiments were conducted on the MovieLens 10M data set using the
original split, with a fixed number of 200 iterations for all experiments. The x-axis indicates
the number of latent factors (rank), and the y-axis the runtime in seconds. The plots show
that the runtime scales linearly with the rank for both implementations.

Figure 2: Runtime comparison between fastFM and libFM on the MovieLens 10M data set. Panels: ALS and MCMC.

The code snippet below shows how simple it is to write Python code that allows model inspection after
every iteration. The induced Python function call overhead occurs only once per iteration
and is therefore negligible. This feature can be used for Bayesian Model Checking as
demonstrated in Figure 3. The figure shows MCMC summary statistics for the first order
hyper parameter $\sigma_w$. Please note that the MCMC solver uses Gaussian priors for the model
parameters (Freudenthaler et al., 2011).

5. http://libfm.org 




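from fastFM import mcmc

number_of_iterations = 10  # example value; not specified in the original
fm = mcmc.FMRegression(n_iter=0)
# initialize coefficients
fm.fit_predict(X_train, y_train, X_test)
for i in range(number_of_iterations):
    # run one more MCMC iteration, warm starting from the current state
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=1)
    # save, or modify (hyper) parameter
    print(fm.w_, fm.V_, fm.hyper_param_)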

Many other analyses and experiments can be realized with a few lines of Python code 
without the need to read or recompile the performance critical C code. 


Figure 3: MCMC chain analysis and convergence diagnostics example for the hyperparameter
$\sigma_w$, evaluated on the MovieLens 10M data set. Panels: trace and density.
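A chain analysis of this kind can start from a few summary statistics of the recorded trace. The sketch below is not the analysis used to produce Figure 3: it computes the mean, standard deviation, lag-1 autocorrelation and the means of the two halves of a scalar trace. The trace here is synthetic, generated by a seeded autoregressive process; in practice it would be collected inside the per-iteration loop. The function name `chain_summary` is hypothetical.

```python
import math
import random
import statistics


def chain_summary(trace, burn_in=0):
    """Summary statistics for one scalar MCMC trace after discarding burn-in."""
    kept = trace[burn_in:]
    mean = statistics.mean(kept)
    var = statistics.pvariance(kept, mu=mean)
    lag1_cov = sum((a - mean) * (b - mean) for a, b in zip(kept, kept[1:])) / len(kept)
    half = len(kept) // 2
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "lag1_autocorr": lag1_cov / var if var > 0 else 0.0,
        # Similar half means suggest the chain is no longer drifting.
        "first_half_mean": statistics.mean(kept[:half]),
        "second_half_mean": statistics.mean(kept[half:]),
    }


# Synthetic trace (NOT real fastFM output): starts far from its stationary
# mean of 1.0 and then fluctuates around it.
rng = random.Random(0)
trace, value = [], 5.0
for _ in range(500):
    value = 1.0 + 0.8 * (value - 1.0) + rng.gauss(0.0, 0.1)
    trace.append(value)

summary = chain_summary(trace, burn_in=100)
```

A large gap between the two half means, or a lag-1 autocorrelation close to one, would suggest running more iterations or discarding a longer burn-in before interpreting the posterior samples.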


4. Related Work 

Factorization Machines are available in the large scale machine learning libraries GraphLab
(Low et al., 2014) and Bidmach (Canny and Zhao, 2013). The toolkit Svdfeatures by Chen
et al. (2012) provides a general MF model that is similar to an FM. The implementations in
GraphLab, Bidmach and Svdfeatures only support SGD solvers and do not provide a ranking
loss. Our objective is not to replace these distributed machine learning frameworks but to
provide an FM implementation that is easy to use and easy to extend without sacrificing
performance.